I. INTRODUCTION


Recent  technological  developments  have  resulted  in  a  wide 
variety  of  imaging  systems  and  subsystems.  The  flexibility 
and  technologies  available  to  the  designer  include  various 
means  for  collection,  coding,  transmitting,  decoding,  analog 
and  digital  processing,  and  analog  and  digital  display.  The 
applications  of  such  systems  and  subsystems  are  myriad, 
ranging  from  static  and  dynamic  military  photointerpretive 
functions, through commercial and closed-circuit television
and facsimile systems, to diagnostic radiological
instrumentation and earth resources applications. The
scientific world is quite familiar with some of the
techniques which can be used to "improve" the nature of any
such image, and the non-scientific world has equally seen
examples  of  such  processing  effectiveness,  such  as  the 
Zapruder  and  Hughes  films  of  the  Kennedy  assassination.  In 
many  cases,  it  is  clear  that  such  processing  and  display 
techniques  can  extract  information  in  the  original  image 
which  is  otherwise  well  below  the  threshold  capacity  of  the 
human  visual  system,  whereas  in  other  cases  it  is  clear  that 
processing  techniques  can  often  serve  either  to  hide 
existing,  and  important,  image  detail  or  to  "create"  image 
detail which is perhaps not present in the original image or
in the "real world". Heretofore, most of these areas of
image system and subsystem development have plainly suffered
from their inattention to human observer requirements. This
is particularly true of the extensive effort in digital
image processing, especially that part devoted to the
improvement ("enhancement", "restoration") of images for
purposes of human information extraction. In nearly all of
the  work  performed  in  laboratories  around  the  country  that 
are  pursuing  this  type  of  research,  the  necessary  evaluative 
efforts  to  determine  the  utility  of  processing  and  display 
techniques  have  not  been  conducted.  Rather,  reports  and 
publications of this work typically take the form of "before
and after" pairs of images, where the reader is left to
estimate the utility of such images either by visual
inspection of these published (second- or third-generation)
photographs or by the subjective opinions offered in the
text by the author.

Because the intent of such image processing techniques is
to improve the information extraction capabilities of the
human observer, it is clearly appropriate and mandatory that
evaluative techniques include objective measurement of human
information extraction from such images, in addition to
subjective estimates of the overall quality or utility of
the image. Unfortunately, the human factors experiments
required to produce quantitative and objective assessment of
image quality have rarely been conducted in image processing
laboratories or in conjunction with image processing
programs.

In  view  of  the  many  millions  of  dollars  being  devoted  to 
image  collection,  processing,  and  display  systems  for  the 
military  and  civilian  use  of  digitized  images,  it  is  quite 
clear  that  an  assessment  program  is  urgently  needed  to 
devise  procedures,  techniques,  and  metrics  of  digital  image 
quality. Such a program requires the establishment of a
standardized set of procedures for obtaining human observer
information extraction performance; relating this
performance, in a quantitative manner, to the various
collection, processing, and display techniques and
algorithms; and devising a quantitative relationship for the
multidimensional scaling of the various collection,
processing, and display techniques in "performance space".

Only through such an integrated program of research can
the system and subsystem designer have meaningful data for
cost-benefit analyses of future system development, be such
systems intended either for military or for non-military
applications. The image collection, processing, and display
technology is now at a point whereby such evaluative
research is sorely needed. Fortunately, microphotometric,
microdensitometric, and human performance measurement
techniques have been evolved during the past several years
to relate human information extraction performance to the
various physical characteristics of both electro-optical and
photographic image displays. The present research program
is designed to extend these recently developed techniques
into the arena of digital images, emphasizing derivation of
metrics of image quality appropriate to digitized images,
and providing quantitative cost-benefit data which will
permit the designer and system developer to plan his
developmental effort as well as to specify optimum system
components for particular image acquisition and display
requirements.


OVERVIEW OF THE RESEARCH PLAN

The research plan is laid out schematically in Figure 1.
Each small, solid-lined box, with the exception of the
uppermost, indicates a separate task to be conducted during
the  course  of  the  four-year  effort.  The  two  large,  broken- 
lined  boxes  delineate  the  specific  display  formats  that  will 
be  studied  and  compared  during  this  initial  program:  black 
and  white  hard-copy  transparencies  and  electronic  displays. 
The  small,  broken-lined  box  at  the  bottom  illustrates 
important  extensions  of  this  research  to  be  pursued  in  the 
future,  namely  interactive  digital  displays  in  both  black 
and white and full color. The present report describes in
detail the hard-copy subjective scaling experiment.

Figure 1. Schematic diagram of proposed research


RESEARCH  OBJECTIVES 

The overall research objectives of this program are as
follows:

1. Develop standardized procedures and techniques to
evaluate hard-copy (film) and soft-copy (CRT) digital
image quality.

2. Compare candidate physical metrics of image quality.

3. Compare hard-copy with soft-copy displays for image
interpretation.

4. Evaluate candidate processing, enhancement, and
restoration algorithms for improvement of image
interpretation on soft-copy displays.

SPECIFIC RESEARCH TASKS

In  keeping  with  the  general  goals  described  above,  the 
specific  research  tasks  are  as  follows: 

1.  Develop  an  imagery  database  and  image  interpretation 
scenarios  from  high  quality  aerial  photography 
relevant to the image interpretation task.

2.  Select  and  purchase  display  and  interface  hardware  to 
present  the  image  database  on  soft-copy  (CRT) 
displays. 

3.  Develop  image  manipulation  software  for  soft-copy  and 
hard-copy  experiments. 

4.  Develop  and  standardize  observer  data  collection 
procedures  for  hard-copy  and  soft-copy  experiments. 

5.  Develop  and  standardize  procedures  for  obtaining 
physical  image  metrics  from  hard-copy  and  soft-copy 
displays. 

6. Digitize and degrade database imagery and record
images  on  hard-copy  and  magnetic  tapes  for  soft-copy 
display. 

7.  Obtain  physical  image  metric  data  for  hard-copy  and 
soft-copy  displays. 

8.  Conduct  subjective  quality  scaling  and  information 
extraction  studies  on  hard-copy  images. 

9.  Conduct  subjective  scaling  and  information  extraction 
studies  on  soft-copy  displays. 

10. Evaluate the utility of image quality metrics for
both  hard-copy  and  soft-copy  imagery. 

11.  Conduct  subjective  scaling  and  information  extraction 
studies  on  processed  soft-copy  imagery. 

12.  Analyze  the  utility  of  image  quality  metrics  for 
processed  soft-copy  imagery. 

13. Compare image quality metrics for hard-copy and
soft-copy (processed and nonprocessed) images. Relate
these results to concepts and models of human visual
performance and to imaging system design variables.

This present report relates to Task 8 above. It
describes the results of that part of the program dealing
with subjective scaling of the hard-copy imagery. It also
addresses the question of how subjective image quality is
affected by measurable physical properties of digitally
derived imagery. Specifically, trained photointerpreters
performed  a  subjective  scaling  task  using  images  which  were 
degraded  by  two  known  physical  characteristics  common  to 
digitized  aerial  imagery,  blur  and  noise.  A  parallel 
experiment  assessing  information  extraction  performance  with 
the same images is reported by Snyder, Turpin, and Maddox
(1981).

In  addition  to  obtaining  these  important  baseline  scaling 
data,  the  experiment  also  served  to  evaluate  the  scaling 
methodology  to  be  used  in  the  subsequent  soft-copy  phases  of 
the research program. Objectives of this methodology are
described  later. 


BACKGROUND 

Often, photointerpreters (PIs) are not able to inspect in
detail all images brought to them. Hence, the photographs
that are used are those that possess high content and
technical quality. Even in this age of high speed computer
technology, the calculable physical metrics are not
completely reliable in determining which photographs will
provide the most information. Thus, PIs typically make a
cursory inspection of imagery to determine which frames they
feel are most interpretable. PIs often use a standard scale
(e.g., NATO
scale)  as  a  reference  for  selection/rejection  decisions, 
although  the  literature  contains  other  scales  and 
experiments  that  purport  to  measure  subjective  image 
quality,  and  some  relate  noise  and  blur  content  to 
subjective  image  quality  for  nondigital  imagery.  Some  of 
this  more  pertinent  literature  is  summarized  below. 

For  the  most  part,  this  present  research  is  unique  in  its 
use  of  digital  images.  Few  laboratories  have  the  facilities 
and  personnel  to  play  an  active  role  in  the  generation  of 
digital  images  and  evaluate  the  effects  of  degrading  them. 
Perhaps  the  most  advanced  research  in  this  area  is  conducted 
by the government laboratories, although few studies of
government PIs have been reported in the literature.


Noise  Effects 


For  example,  several  studies  have  investigated  the 
effects of random noise upon subjective assessments of image
quality (Allnatt and Prosser, 1966; Below, Huertas-Gendra,
Fritze,  and  Samrau,  1963;  Geddes,  1963;  Newell  and  Geddes, 
1963;  Prosser  and  Allnatt,  1965;  Weaver,  1959).  All  of 
these  studies  involved  raster-scan  television  images.  The 
primary  aim  of  these  studies  was  to  develop  standards  for 
commercial  television  broadcasting.  Without  exception, 
descriptive  statistics  were  the  only  results  reported. 
Variations,  other  than  noise  levels,  in  the  experimental 
conditions  across  these  studies  include  viewing  ratio,  the 
type  of  observers,  the  number  of  raster  lines  (405-625), 
average  luminance,  ambient  illuminance,  color  versus  black 
and  white  presentation,  noise  frequency  and  waveform,  and 
the  subjective  scale  used  for  rating  the  images.  The  type 
of  scales  used  by  these  investigators  is  called  a  grading 
scale  (Prosser,  Allnatt,  and  Lewis,  1964).  The  scales 
consist  of  points  (typically  1-7)  with  descriptive 
adjectives  corresponding  to  each  point.  Some  of  the  scales 
measure  the  amount  of  image  impairment  while  others  measure 
the  subjective  quality  of  the  image.  In  these  studies,  each 
observer  assigned  a  number  to  an  image,  a  technique  known  as 
direct  magnitude  estimation  (Stevens,  1975). 

The  authors'  conclusions  from  these  studies  are 
consistent:  noise  degrades  image  quality.  It  is 
infeasible to eliminate all noise from a television image.
However,  in  developing  television  standards,  mark  points 
have  been  reported.  A  mark  point  refers  to  the  noise  level 
that  intersects  a  particular  subjective  score.  These  points 
can  be  determined  most  easily  by  graphic  methods.  For  a 
scale with five points (bad, poor, fair, good, and excellent,
scored one through five, respectively), a noise level is
associated with each scale category. Therefore, a standard
might be based on the noise level (or signal-to-noise ratio)
yielding a rating of four, or good. In general, the middle
scale category (average quality, good to poor, marginal,
obviously impaired but not objectionable) has been associated
with a range of signal-to-noise ratios of 23 to 35 dB, or an
average of about 30 dB. A signal-to-noise ratio of 30 dB is
equivalent to an amplitude ratio of about 32:1.
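
The decibel figures quoted here can be checked with a short calculation. The following is a minimal sketch, assuming the conventional definition dB = 20 log10(ratio) for amplitude ratios; the function names are illustrative and do not come from the studies cited.

```python
import math


def db_to_amplitude_ratio(db):
    """Convert decibels to an amplitude ratio, assuming dB = 20 * log10(ratio)."""
    return 10.0 ** (db / 20.0)


def amplitude_ratio_to_db(ratio):
    """Convert an amplitude ratio to decibels under the same assumption."""
    return 20.0 * math.log10(ratio)


if __name__ == "__main__":
    # The range and average reported for the middle scale category.
    for db in (23, 30, 35):
        print(f"{db} dB corresponds to about {db_to_amplitude_ratio(db):.1f}:1")
```

Under this assumption, 30 dB works out to roughly 31.6:1, which rounds to the 32:1 figure given in the text.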

While  these  studies  of  the  effects  of  noise  provide  some 
data  for  comparison  with  the  present  research,  the  problems 
inherent  in  doing  so  should  be  mentioned.  Television 
displayed  images  are  quite  different  from  digitized 
photographs.  It  would  be  foolhardy  to  expect  great 
similarity  in  the  results.  Further,  in  all  of  these 
studies,  signal-to-noise  ratios  were  calculated  at  the  input 
to  the  video  system.  The  actual  viewed  signal-to-noise 
ratios  should  be  measured  at  the  display  by  photometric 
means.

Rosell and Willson (1973) have done much to forward the
study of noise and its role in target detection. Their
experiments  have  involved  many  aspects  of  visual 
performance.  Much  of  the  preliminary  psychophysical  work  by 
these  authors  involved  the  detection  of  displayed  stripes 
(called  bar  patterns)  on  photographs  mixed  with  noise  and 
displayed  on  a  television  monitor.  After  determining 
threshold signal-to-noise levels in bar pattern detection
tasks, experiments were conducted in recognizing and
identifying military targets. These threshold data were
then compared with the bar pattern studies, and it was found
that  the  results  of  the  bar  studies  could  predict 
recognition/identification  performance,  and  vice  versa.  By 
applying  a  scale  factor  of  eight  for  bar  pattern  detection 
data  and  matching  bar  area  to  target  area,  the  authors 
reported  a  high  degree  of  similarity  in  the  data.  No 
statistics  were  reported. 

Finally,  Humes  and  Bauerschmidt  (1968)  took  a  different 
approach  to  the  study  of  noise  degraded  imagery.  Using  the 
method of limits and five standard signal-to-noise ratios
from 2 to 29.1 dB, their judges reported whether or not the
test images were equal to or more or less "noisy" than the
standards. The 29 images used in their study had previously
been used in a target recognition study in which the
standard signal-to-noise ratios were effective in showing a
performance difference. The psychophysical study found an
uncertainty range of 1 dB at the [illegible] dB standard and
about [illegible] dB at the 29.1 dB standard. In particular,
this relationship between range of uncertainty and the
standard signal-to-noise ratio was linear above the 7 dB
standard.

Blur Effects

Blur has not been studied as extensively as noise with
respect to subjective scaling. In fact, one must draw upon
related research in order to gain any insight into the
effects of blur. Film degraded by blur is generally
discarded immediately, unless new pictures cannot be
obtained.

Blur  in  raster-scan  images  is  similar  to  echo  or  delay, 
which  is  manifested  in  "ghost"  images  about  the  desired 
signal. Several studies have discussed this type of
impairment (Allnatt and Prosser, 1965; Cavanaugh and
Lessman, 1971; Lessman, 1972; Weaver, 1959). These studies
are very similar to those of noise-impaired television
images. Whereas noise is characterized by frequency and
amplitude, echo can be defined in terms of the delay (spread
in time/space about the signal) and amplitude. Echo
impairment is typically expressed as a signal-to-echo ratio,
the amplitude of the signal to the amplitude of the
displaced signal at a particular point before or after the
desired signal. Allnatt and Prosser (1965) and others have
shown quite convincingly that a delay of two microseconds is
most objectionable subjectively.

The results of studies in this area are in general
agreement. The middle scale value is associated with a
range of signal-to-echo ratios (in dB) from [illegible] to
[illegible], with an average of about 20 dB. A
signal-to-echo ratio of 10:1 is equivalent to 20 dB. These
numbers provide another means of comparing an analog mode of
presentation with the present data.

Subjective Scaling and Information Extraction

Several studies of photo-interpretation by PIs have been
published, primarily in the technical report literature. A
few of these studies have investigated the relationship
between scale values and information extraction performance
(Brainard, Sadacca, Lopez, and Ornstein, 1960; Klingberg,
Elworth, and Filleau, 1970; Sadacca and Schwartz, 1963).
(Information extraction performance data were available for
a subset of the images used in the present research.
Performance data allowed for an evaluation of the
relationship between performance and scale values in this
study.)

Sadacca and Schwartz (1963) used a ranking technique to
scale 72 images. Several performance measures were also
collected, including the number of correct target
identifications, the number of wrong target identifications,
and an overall accuracy score. The ranks were correlated
with performance for each of three scenes and the three
performance measures. The correlations between the ranks
and the number of correct identifications ranged from .19 to
.59, between ranks and incorrect identifications from
[illegible] to .00, and between ranks and overall accuracy
from .14 to .51, for the three scenes. It should be noted
that different PIs were used in the performance and scaling
studies.

Brainard et al. (1960) had PIs make relative judgments
in scaling image quality. Each of 30 images (3 scenes) was
rated by comparing each image with a catalog of 30 degraded
images and assigning each test image a catalog image number.
Two performance measures were collected: number of correct
target identifications and number of correct identifications
of target area. The same PIs participated in both phases of
the experiment. The correlations between catalog numbers
and target identification performance ranged from .59 to
.70, and for area identification from .55 to .79.

Klingberg et al. (1970) used a standard ranking
procedure  to  scale  32  images.  A  target  identification 
measure  was  again  collected.  Subjective  rank  correlated 
very  highly,  .92,  with  the  performance  measure. 

In the studies reviewed here, PIs were clearly able to
predict the interpretability of images through a variety of
scaling procedures. The extent of the predictability,
however, was not consistent across studies.
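
The rank-versus-performance correlations reviewed above can be illustrated with a small calculation. The sketch below computes a Spearman rank correlation in plain Python. The cited reports do not state which coefficient was used, so the choice of Spearman is an assumption, and the data in the usage example are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical data: subjective quality scores and correct identifications.
    quality_scores = [7, 5, 6, 3, 2, 4]
    correct_identifications = [12, 9, 10, 7, 5, 6]
    print(round(spearman(quality_scores, correct_identifications), 2))
```

A coefficient near 1 would mean that the subjective ordering of images closely tracks the ordering by measured performance.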

Multidimensional Scaling

Two technical reports have been published in which
multidimensional scaling (MDS) was applied to subjective
image quality (Marmolin and Nyberg, 1978; Sadacca and
Schwartz, 1963). Sadacca and Schwartz (1963) used 12 images
varying in scene content, ground scale, sharpness, and
contrast. Twenty-one PIs judged the similarity, based on
interpretability, of all pairs of images (66). Another group
of PIs performed in an information extraction study with
this imagery database. The authors chose six dimensions for
their MDS spatial configuration but could only explain four
of the six. The four dimensions were interpreted by
averaging the projections over the levels of the imagery
parameters and relating the direction of these means to the
logical subjective effect of the parameters. The means
indicated the following: dimension one was related to ground
scale,  dimension  two  was  related  to  sharpness,  dimension 
three  to  scene  content,  and  dimension  four  to  contrast.  No 
other  analysis  was  attempted  using  the  projections. 

Marmolin  and  Nyberg  (1978)  took  a  similar  approach  using 
24  degraded  images  and  four  untrained  judges.  Physical 
quality  metrics  were  calculated  for  each  image.  Four 
dimensions  were  chosen  to  represent  the  data.  While  six 
dimensions  produced  minimum  stress,  the  authors  chose  the 
four-dimension  model  because  these  dimensions  could  be 
interpreted.  The  four  dimensions  were  interpreted  from 
correlations obtained between the image parameters
(including  physical  metrics)  and  the  projections  on  each 
dimension.  The  four  dimensions  were  interpreted  as 
sharpness,  noise,  contrast,  and  the  bandpass  by  contrast  by 
noise  interaction,  respectively.  Since  no  information 
extraction  performance  measures  were  collected,  no  analyses 
relating  the  projections  to  performance  were  conducted. 
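
Two quantities in the MDS studies above lend themselves to a short worked example: the number of image pairs to be judged, and the "stress" of a candidate configuration. The sketch below is illustrative only. It applies Kruskal's stress-1 formula directly to the raw dissimilarities (a metric simplification), which is an assumption and not necessarily the procedure used in either cited report; all names and data are hypothetical.

```python
import math
from itertools import combinations


def n_pairs(n_items):
    """Number of distinct unordered pairs among n_items stimuli."""
    return n_items * (n_items - 1) // 2


def stress_1(coords, dissimilarities):
    """Stress-1 of a configuration against target dissimilarities.

    coords: list of equal-length coordinate tuples, one per stimulus.
    dissimilarities: dict mapping (i, j) with i < j to the judged dissimilarity.
    """
    numerator = 0.0
    denominator = 0.0
    for i, j in combinations(range(len(coords)), 2):
        distance = math.dist(coords[i], coords[j])
        numerator += (distance - dissimilarities[(i, j)]) ** 2
        denominator += distance ** 2
    return math.sqrt(numerator / denominator)


if __name__ == "__main__":
    print(n_pairs(12))  # pairs among 12 images

    # Hypothetical dissimilarities that fit a 3-4-5 triangle exactly.
    judged = {(0, 1): 3.0, (0, 2): 4.0, (1, 2): 5.0}
    two_dim = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
    one_dim = [(0.0,), (3.0,), (-4.0,)]
    print(stress_1(two_dim, judged), stress_1(one_dim, judged))
```

Lower stress indicates a better fit, which is why adding dimensions reduces stress even when the extra dimensions cannot be interpreted.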

These  MDS  analyses  can  be  compared  with  the  present 
analysis.  To  the  extent  that  the  data  collected  in  this 
research  may  differ  from  other  MDS  analyses  of  subjective 
image  quality,  a  basis  for  comparing  digital  versus  analog 
photographs  presents  itself.  While  a  well-mapped  subjective 
quality space may serve as a device for screening the
interpretability of photographs in the future, this will be
useful only if a spatial configuration can be located that
is highly related to performance (i.e., projections must
correlate with performance). This consideration has not
been addressed in past research.

Rating Scales

Almost every rating scale used by experimenters in
scaling studies has a different number of categories. The
choice as to the number of categories to include in a scale
is generally arbitrary, depending on the resolution desired
in the obtained response. Why the existing NATO scale and
perhaps other similar scales used by PIs to rate the
interpretability of photographs have 10 categories is
unknown. There are no data to indicate this number to be
optimal in rating ground resolved distance (GRD) or image
quality in general.

Without justification, Guilford (1954) suggested that
about 20 categories are optimal in general uses of rating
scales. The optimal number of categories is dependent on
the number of stimulus categories (and stimulus variation)
to be scaled (Erikson and Hake, 1955). The basic issue is
the number of absolute judgements that can be made in a
particular stimulus dimension. In this research, unlike
basic research in auditory or visual perception, the
stimulus property to be scaled (e.g., frequency, luminance,
chrominance) is not easily or objectively measurable. In
perception research it has been shown that about seven
categories are sufficient for a variety of stimulus
dimensions (Miller, 1956). However, Muller, Sidorsky,
Slivinske, Alluisi, and Fitts (1955) have shown that many
more categories (24) can be used efficiently when the judges
are allowed to practice the task. Therefore, it would seem
reasonable to conclude that a scale with more than 10
categories would better provide the response resolution
needed by a well-practiced PI. For these reasons, and
because the subjective scaling technique to be used in
subsequent studies in this program should use the same scale
(for comparability of results), careful attention was given
to the scale selection.

A large part of this research involved the determination
of the most efficient scaling procedure, given the
constraints. Because of the cost involved in generating the
imagery and collecting the data, considerable time and
effort were spent in evaluating alternative scaling
approaches. It was known during this period of evaluation
that 15 PIs would be available for about four hours each.
It was also known that an imagery database consisting of 250
images would be available. Because of the professional
attitude held by most PIs, the experimental task would have
to be parsimonious and, in general, similar to the typical
practices of the PI. Consequently, an unclassified NATO
scale was found that would serve as a reference for scaling
image quality (Appendix A). The NATO scale was optimal in
that it possessed a high degree of similarity to the scales
used operationally by PIs. It is also presumed to relate
directly to GRD. The only remaining problem, and perhaps
the most difficult to resolve, was to decide how to use the
scale to collect the most meaningful data without creating
an operationally meaningless task for the PIs.

THE ALTERNATIVES

Two broad classes of subjective scaling methods exist:
direct and indirect. Each class encompasses numerous, but
varied, approaches. The major difference between the two
classes is the specificity of the obtained response
information. When the response given by the judge (or PI)
provides a direct quantization (numerical representation) of
the subjective effect of the stimulus, then the procedure
can be said to be direct. Direct methods of scaling can
result  in  interval  or  ratio  scale  values;  judges  simply 
report  the  scale  values.  Types  of  direct  methods  include 
ratio  estimation,  magnitude  estimation,  fractionation,  and 
cross-modality  matching.  On  the  other  hand,  the  indirect 
methods  of  scaling  result  in  ordinal  scale  values  that  can 
be  transformed  to  an  interval  or  ratio  scale.  The  indirect 
methods  are  typified  by  the  pair  comparison  approach,  where 
all of the experimental stimuli are paired and compared on
the  basis  of  some  attribute.  Other  indirect  methods  include 
the  methods  of  triads  and  rank  ordering.  In  most  cases, 
both  direct  and  indirect  methods  will  produce  satisfactory 
scale  values;  however,  experimental  constraints  often 
dictate  the  most  feasible  approach  to  take. 
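
To illustrate how ordinal pair-comparison judgements can be transformed to an interval scale, the sketch below uses Thurstone's Case V model. This is one well-known transformation chosen here for illustration; the report does not say which transformation it has in mind, and the function name, the clamping bounds, and the data are all assumptions.

```python
from statistics import NormalDist


def thurstone_case_v(wins, n_judgments):
    """Interval scale values from a pair-comparison win matrix (Thurstone Case V).

    wins[i][j] is the number of times stimulus i was judged better than
    stimulus j, out of n_judgments comparisons of that pair.
    """
    inverse_cdf = NormalDist().inv_cdf
    n = len(wins)
    scale = []
    for i in range(n):
        z_total = 0.0  # the diagonal (i == j) contributes a z-score of 0
        for j in range(n):
            if i == j:
                continue
            proportion = wins[i][j] / n_judgments
            # Clamp unanimous proportions so the z-score stays finite.
            proportion = min(max(proportion, 0.01), 0.99)
            z_total += inverse_cdf(proportion)
        scale.append(z_total / n)
    return scale


if __name__ == "__main__":
    # Hypothetical results: three images, each pair judged by 10 observers.
    wins = [
        [0, 8, 9],
        [2, 0, 7],
        [1, 3, 0],
    ]
    print([round(value, 2) for value in thurstone_case_v(wins, 10)])
```

The resulting values have an arbitrary origin and unit, so only the differences between them are meaningful.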

THE MOST FEASIBLE ALTERNATIVE


The direct methods of scaling are very easy to apply. A
seemingly acceptable task entailed each PI making 250 ratio
estimates in four hours. As PIs, in their daily work, use
an internalized concept of GRD to draw inferences from
imagery, the NATO scale seemed to be an ideal instrument to
use in the study of the subjective interpretability of
images. The NATO scale, in conjunction with ratio
estimation, satisfied the needs of this research (e.g., a
task similar to that of the PIs' daily practices, a scaling
method that would provide scale values for subsequent MDS
analysis, and a method that could be used in light of the
constraints on time and the number of available PIs).

The  indirect  methods,  particularly  pair  comparisons,  are 
perhaps  more  reliable  than  the  direct  methods  because  of  the 
procedure  involved  (relative  judgements),  but  require  a 
large  amount  of  time  in  data  collection.  Though  there  are 
special  pair  comparison  techniques  for  reducing  the  number 
of  pairs  to  be  judged,  with  limitations  on  the  number  of 
judges  available,  pair  comparisons  could  not  be  used  in  this 
study  without  reducing  the  range  of  experimental  parameters 
currently  existing  in  the  imagery  database. 
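
The time constraint can be made concrete with a rough calculation. The sketch below assumes, for illustration only, that a complete pair-comparison design would pair every one of the 250 images with every other, and that each PI has four hours available; the function names are hypothetical.

```python
def full_pair_count(n_images):
    """Number of distinct pairs in a complete pair-comparison design."""
    return n_images * (n_images - 1) // 2


def seconds_per_judgment(n_judgments, hours_available):
    """Average time available per judgment for a single observer."""
    return hours_available * 3600.0 / n_judgments


if __name__ == "__main__":
    n_images = 5 * 5 * 10  # noise levels x blur levels x scenes
    hours = 4.0

    ratio_time = seconds_per_judgment(n_images, hours)
    pair_total = full_pair_count(n_images)
    pair_time = seconds_per_judgment(pair_total, hours)

    print(f"Ratio estimation: {n_images} judgments, {ratio_time:.1f} s each")
    print(f"Complete pairs:   {pair_total} judgments, {pair_time:.2f} s each")
```

Under these assumptions, ratio estimation allows just under a minute per image, whereas a complete pair-comparison design would leave well under a second per pair for a single PI.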

Consequently,  ratio  estimation  using  the  NATO  scale  as 
the  reference  appeared  to  be  the  most  reliable  and  time 
efficient  means  of  collecting  subjective  data. 


III. PURPOSE OF THIS EXPERIMENT


The purpose of this experiment was to investigate the
psychophysically scaled effects of blur and noise on digital
image quality. To maintain an experimental environment in
which PIs are most accustomed to working, the NATO scale was
chosen as a reference for rating image quality. Because of
the inclusion of the NATO scale, issues related to the use
of scales were also studied.

The specific objectives/hypotheses of this research are
as follows: (1) digital images degraded by blur and/or
noise appear less interpretable as the degree of degradation
increases; (2) the optimal number of categories in the NATO
scale is greater than 10; (3) an MDS analysis can be used to
map the subjective, spatial dimensionality of the imagery
database and predict information extraction performance; (4)
the correlation between information extraction performance
and ratings of apparent interpretability is high for trained
PIs; and (5) the scaling technique used will produce data
sufficiently consistent and meaningful such that the same
scaling technique can be used in the subsequent soft-copy
studies in this program.
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IV.  METHOD 


PHOTO  INTERPRETERS 

The PIs used in this experiment were 14 NATO-scale
trained military PIs who were stationed at Hickam Air Force
Base, Hawaii. One of fifteen PIs scheduled to participate
in the study declined because of the importance and urgency
of her regular work. The average age of the PIs was about
25 years and the average experience level about four years.
The study was conducted at Hickam Air Force Base during
normal working hours. The PIs were not paid extra for their
participation.

APPARATUS 

A  Perkin-Elmer  microdensitometer  was  used  to  digitize  10 
standard  photographs  consisting  of  three  orders  of  battle: 
air,  electronic,  and  sea  (i.e.,  four  airfield  scenes,  two 
scenes  of  typical  research  and  development  installations, 
and  four  quay  and  shipyard  scenes).  The  images  were 
digitized  in  a  4096  x  4096  picture  element  format.  The 
complete set of digitized images was planned to represent
all combinations of five levels of noise (10, 20, 40, 80,
and 160 digital units, i.e., signal-to-noise ratios of 200,
100, 50, 25, and 12.5), five levels of blur (20, 40, 80, 160,
and 320 micrometers (μm)), and 10 scenes. Blur was produced in
software  by  multiplying  the  frequency  spectrum  of  each 
digitized  image  by  an  appropriate  Gaussian  filter  function. 
A  Fast  Fourier  Transform  yielded  the  frequency  spectra.  The 
image  matrices  were  then  reconstructed  by  the  Inverse 
Fourier  transform.  As  blur  was  added  to  each  image,  high 
frequency  detail  was  removed.  Noise  was  added  to  the  images 
by  multiplying  each  picture  element  by  a  value  randomly 
selected  from  an  appropriate  Gaussian  random  noise  function. 
The  scenes  and  amounts  of  degradation  were  scrutinized  and, 
in  part,  chosen  by  a  senior  PI  who  provided  the  scores  for 
the  information  extraction  study.  For  a  more  thorough 
description  of  the  imagery  database  and  its  development,  see 
Burke  and  Strickland  (1982).  For  reasons  given  in  that 
report,  the  final  hardcopy  SNR  levels  were  75,  60,  42,  24, 
and  12,  while  the  final  blur  levels  were  40,  52,  84,  162, 
and  322  micrometers. 
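
The degradation steps described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the software used in the study: the function names, the blur width `sigma_px` expressed in pixels, and the small random test image are hypothetical. The multiplicative noise is modeled as a Gaussian factor with mean 1, and the signal level of 2000 digital units is inferred from the stated pairs of noise levels and signal-to-noise ratios (10 units of noise for a ratio of 200).

```python
import numpy as np

def gaussian_blur_fft(image, sigma_px):
    """Blur by multiplying the image spectrum by a Gaussian filter function."""
    rows, cols = image.shape
    fy = np.fft.fftfreq(rows)[:, None]   # vertical frequency, cycles per pixel
    fx = np.fft.fftfreq(cols)[None, :]   # horizontal frequency, cycles per pixel
    # Fourier transform of a spatial Gaussian (std sigma_px) is Gaussian in frequency.
    transfer = np.exp(-2.0 * (np.pi * sigma_px) ** 2 * (fx ** 2 + fy ** 2))
    spectrum = np.fft.fft2(image)                       # forward FFT
    return np.real(np.fft.ifft2(spectrum * transfer))   # inverse transform

def multiplicative_gaussian_noise(image, noise_sd, signal_level, rng):
    """Multiply each pixel by a factor drawn from N(1, noise_sd / signal_level).

    Assumption: the source only says each pixel was multiplied by a value
    drawn from a Gaussian noise function; the mean-1 factor is a guess.
    """
    factor = rng.normal(loc=1.0, scale=noise_sd / signal_level, size=image.shape)
    return image * factor

def snr(signal_level, noise_sd):
    """Signal-to-noise ratio as a plain ratio (not decibels)."""
    return signal_level / noise_sd

rng = np.random.default_rng(0)
image = rng.uniform(0.0, 2000.0, size=(64, 64))   # placeholder image, not study data
blurred = gaussian_blur_fft(image, sigma_px=2.0)
noisy = multiplicative_gaussian_noise(blurred, 40.0, 2000.0, rng)
print([snr(2000.0, n) for n in (10, 20, 40, 80, 160)])
```

Because the Gaussian filter equals 1 at zero frequency, the mean image level is preserved while high-frequency detail is attenuated, which matches the description of blur removing high-frequency detail.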

The 250 images were shown to the PIs in the form of
positive transparencies, 7.6 x 7.6 cm. A light table
(Richards Model 33H100) with binocular zoom stereo optics
and hand-held tube magnifiers was available to all PIs
during data collection.

As mentioned, the existing NATO scale that was used to
scale the transparencies is based on interpretability and
GRD. Each increasing whole number, 0 to 9, represents a 50%
reduction of GRD, beginning with "useless for
interpretation" (scale score 0) and greater than 9 m (scale
score 1), and ending with less than 10 cm for a scale score
of 9. (See Appendix A for the scale.) In this experiment,
the PIs were asked to report not only a whole number scale
value but also a decimal. In other words, because each
whole number represents a range of interpretability and
MGRD, the PIs had to interpolate to the nearest 0.1 over the
range to report the decimal portion of their scale value.
Thus, the existing 10-point NATO scale was transformed to a
100-point scale.


PROCEDURE 

Each PI was allowed approximately four hours to scale all
250 transparencies. Rest periods were allowed as needed.
The 250 transparencies were administered to each PI in a
different random order. The randomization scheme was
obtained using the Statistical Analysis System (SAS) Plan
Procedure (Barr, Goodnight, Sall, and Helwig, 1976). The
scale values were reported verbally by the PI and recorded
manually by the experimenter. The instructions were
administered to each PI as follows:

This experiment will involve your rating numerous
transparencies on the basis of interpretability.
It is assumed that you have had some experience
with rating scales; if you have not, you should
inform the experimenter at this time.

The rating scale that will be used here is a
0-9 imagery interpretability rating scale. This
scale is probably similar to those you are
familiar with. Please look over the scale
(attached) at this time. As you can see, larger
scale values represent a greater interpretability.
The range of some interpretation capabilities that
fall into that range are given under each Rating
Category, 0-9. Your task will be to rate each
transparency using this scale.

However, in an attempt to obtain more
information from your ratings, we would like you
to report your ratings to the nearest tenth. That
is, your ratings should take the form 3.9, 7.2, or
5.0, rather than simply 0, 1, 2, 3, 4, 5, 6, 7, 8,
or 9. It may be a good strategy to determine the
range in which a particular transparency falls
(say between 6 and 7) and then try to refine your
rating. Please remember to be explicit in
reporting both portions of your rating, a whole
number and a decimal. We hope to show that PIs
can rate transparencies with more resolution than
conventional scales afford. If you have any
questions regarding the scale or how we would like
you to use it, please ask the experimenter at this
time.

In this phase of the experiment, you will see
250 transparencies. The 250 transparencies
represent 10 scenes that have differing amounts of
noise and blur. Your task is to rate each
transparency using the scale we have just
discussed. The scale will be available for use
during the experiment, and you may use any
equipment you feel would be helpful. However, you
will only have about 4 hours to rate the entire
set of 250 transparencies; therefore, you should
spend about 1 minute on each. The experimenter
will be in the room with you while you rate the
transparencies. He will hand you each
transparency, and after you arrive at a rating,
please hand back the transparency and verbally
report your rating.

Before  we  begin,  do  you  have  any  questions 
regarding  this  phase  of  the  experiment? 

The instructions given to and the procedure carried out by
the experimenter were consistent with the above
instructions.
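
The per-PI randomization described in the Procedure can be illustrated with a short Python stand-in. The study itself used the SAS Plan Procedure; the function name, the seed, and the use of Python's `random` module here are assumptions made only for illustration.

```python
import random

def presentation_orders(n_images=250, n_judges=14, seed=1982):
    """Return one independently shuffled presentation order per judge.

    The seed is arbitrary and hypothetical; it only makes the sketch repeatable.
    """
    rng = random.Random(seed)
    orders = []
    for _ in range(n_judges):
        order = list(range(1, n_images + 1))   # transparency numbers 1..250
        rng.shuffle(order)                     # a different random order each time
        orders.append(order)
    return orders

orders = presentation_orders()
print(len(orders), len(orders[0]))
```

Each judge sees every transparency exactly once, so order effects are spread across judges rather than tied to particular images.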

Prior to participation in this experiment, each PI was
asked to read and sign an informed consent form to ensure
that the rights of the participant were known and upheld.


V.  RESULTS


ANALYSIS  OF  VARIANCE  (ANOVA)  ON  RATINGS 

The  overall  ANOVA  of  the  scaling  data  includes  the  three 
parameters  of  the  imagery  database,  noise,  blur,  and  scenes, 
in  addition  to  order  of  battle  and  PI.  As  indicated  earlier, 
the  imagery  database  consists  of  scenes  that  were  nested  in 
various  orders  of  battle.  The  summary  table  for  the  overall 
ANOVA  indicates  that  almost  every  source  of  variation, 
including  interactions,  was  significant  (Table  1). 

Another factor was separately analyzed but does not
appear in the summary table. Two of the fourteen PIs who
participated in the study were actually Army PIs while the
other twelve were in the Air Force. Therefore, the main
effect of Service and the Service x Scene interaction were
analyzed. Both effects were non-significant (F = .26, p
= .6134 and F(7,43) = 1.93, p = 0.75, respectively) and were
deleted from the overall analysis.


TABLE 1

Summary of Analysis of Variance on NATO Scale Scores


SOURCE                      df         MS          F          p

Blur (B)                     4     832.01     105.63     <.0001
Noise (N)                    4     104.20      82.59     <.0001
Order of Battle (OB)         2      57.36      16.01     <.0001
Photointerpreter (PI)       13     259.24
Scene/OB (S/OB)              7      28.80      17.49     <.0001
B x N                       16       5.96      14.06     <.0001
B x S/OB                    28       2.30       5.14     <.0001
B x PI                      52       7.88
B x OB                       8       5.40       8.03     <.0001
N x S/OB                    28        .57       1.40      .0882
PI x N                      52       1.26
N x OB                       8       1.41       3.09      .0036
PI x S/OB                   91       1.65
PI x OB                     26       3.39
B x N x S/OB               112        .42       1.26      .0378
B x N x PI                 208        .42
B x N x OB                  32        .41       1.09      .3349
B x PI x S/OB              364        .45
B x PI x OB                104        .67
PI x N x S/OB              364        .41
PI x N x OB                104        .46
B x PI x N x S/OB         1456        .34
B x PI x N x OB            415        .37

Total                   (3499)
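
As a reading aid for Table 1, the F ratios can be recomputed from the listed mean squares. The sketch below assumes that each effect was tested against its interaction with PI; this choice of error term is inferred from the numbers in the table and is not stated in the text. The dictionary and function names are hypothetical.

```python
# Mean squares transcribed from Table 1.
mean_squares = {
    "B": 832.01, "N": 104.20, "S/OB": 28.80,
    "B x N": 5.96, "B x OB": 5.40, "N x OB": 1.41,
    "B x PI": 7.88, "PI x N": 1.26, "PI x S/OB": 1.65,
    "B x N x PI": 0.42, "B x PI x OB": 0.67, "PI x N x OB": 0.46,
}

# Assumed error term for each effect: its interaction with PI.
error_term = {
    "B": "B x PI", "N": "PI x N", "S/OB": "PI x S/OB",
    "B x N": "B x N x PI", "B x OB": "B x PI x OB", "N x OB": "PI x N x OB",
}

# F values as printed in Table 1.
reported_f = {
    "B": 105.63, "N": 82.59, "S/OB": 17.49,
    "B x N": 14.06, "B x OB": 8.03, "N x OB": 3.09,
}

def f_ratio(effect):
    """F = MS(effect) / MS(effect x PI)."""
    return mean_squares[effect] / mean_squares[error_term[effect]]

for effect, reported in reported_f.items():
    print(f"{effect:8s} recomputed F = {f_ratio(effect):7.2f}   reported F = {reported:7.2f}")
```

The recomputed values agree with the printed ones to within about one percent; the small differences are consistent with the mean squares having been rounded to two decimals.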

Blur

Increasing blur results in decreasing scale values, as
plotted in Figure 2. A Newman-Keuls multiple comparisons
test (MCT) showed that all comparisons were significant (p <
.05) except between blurs of 40 and 52 μm and between 52 and
84 μm. The trend was monotonically decreasing with
increasing blur degradation. The linearity of this effect
is indicated by a linear correlation of r = .975, p < .0001.

The main effect of Noise was also highly significant, as
shown in Figure 3. Increases in signal-to-noise ratio
generally increased scale values. The MCT showed that all
comparisons were significant except between signal-to-noise
ratios 75 and 60. This trend was also monotonically
decreasing and highly linear with degradation, r = .945, p <
.0001.

Order  of  Battle 

The main effect of Order of Battle (OB) is illustrated in
Figure 4. The MCT showed that both air and sea orders of
battle were rated more interpretable than electronic scenes
(p < .05), although air and sea OBs did not differ from one
another (p > .05).

Figure 5: The effect of Scene on NATO scale value

Blur x Noise

The Blur x Noise interaction is shown in Figure 6. In
general, the signal-to-noise effect is reduced as blur
increases. With reduced blur, the decrease in NATO scale
value with decreasing signal-to-noise ratio becomes more
pronounced. Simple F ratios were calculated for the Noise
effect at each Blur level. All of these Noise simple
effects were highly significant, p < .0001, with the
exception of the 322 μm blur, p = .003. Subsequent MCTs
showed that fewer and fewer Noise levels differed from one
another with increasing blur. Appendix D gives the details
of the MCT for each level of blur.

Figure 6: The effect of Blur x Noise on NATO scale value
(abscissa: peak signal / RMS noise)


OB x Blur

The interaction between OB and Blur is shown in Figure 7.
Generally, the effect of Blur was similar for all OBs,
although a few crossover data points exist to cause
statistical significance of the interaction. Simple F
ratios were calculated for OB at each Blur level. The OB
simple effect was significant at all levels of Blur except
at a blur of 40 μm. A blur of [illegible] μm produced
differences among all OBs, while at the intermediate levels

Figure 7: The effect of Blur x Order of Battle on NATO
scale value

OB x Noise

The interaction between OB and Noise is shown in Figure
8. The difference in scale values between air and sea OBs
becomes greater with decreasing signal-to-noise ratios.
Simple effect F-ratios calculated at each Noise level were

Figure 8: The effect of Noise x Order of Battle on NATO
scale value (abscissa: peak signal / RMS noise)


Scene x Blur

The interaction between Scene and Blur was statistically
significant (Figure 9) but is difficult to interpret
logically. Simple effect F-ratios were calculated for
Scenes at each Blur level. All Scene simple effects were
highly significant, p < .0001. MCTs showed various
differences among Scenes at each Blur level. While the
ordered Scene means at each Blur level were similar to the
order shown by the overall Scene main effect, each Blur
level produced unique differences among Scenes. Appendices
D-H depict the Scene differences for Blur levels 40 to 322
μm, respectively. MCTs were conducted but provide little in
the way of clarification of the interaction; apparently,
some Scenes were affected more by Blur than were other
Scenes.

Figure 9: The effect of Blur x Scene on NATO scale value
(abscissa: blur, μm)


Scene  x  Noise 

The  Scene  x  Noise  interaction  did  not  reach  statistical 
significance.  The  means  were  plotted  and  appear  in  Figure 
10.  Obviously,  the  effect  of  noise  was  fairly  constant 
across  all  Scenes. 

Scene  x  Blur  x  Noise 

The  Scene  x  Blur  x  Noise  interaction  was  statistically 
significant,  and  is  plotted  in  Figures  11-20.  Each  figure 
represents  the  Blur  x  Noise  interaction  for  a  different 
Scene:  Figure  11,  Scene  1,  Figure  12,  Scene  2,  Figure  13, 


Figure 15: The effect of Blur x Noise on NATO scale value,
Scene 5 (ordinate: mean NATO scale value; abscissa: peak
signal / RMS noise)

Figure 16: The effect of Blur x Noise on NATO scale value,
Scene 6 (ordinate: mean NATO scale value; abscissa: peak
signal / RMS noise)


OPTIMAL  NUMBER  OF  RESPONSE  CATEGORIES 

When it was decided that the NATO scale would be used in
this research, the only concern was that the original scale
might have too few response categories (10) to test
thoroughly the subjective resolution of the PIs. Therefore,
as noted earlier, the normal use of the scale was modified
to accommodate 100 response categories. Though 100
categories were believed to be more than necessary, it was
felt that the PIs could use the whole number/decimal scale
with less difficulty than any other interpolation scheme.

The Shannon-Wiener measure of information (see Equation 1)

    H = -\sum_{i=1}^{n} p_i \log_2 p_i                    (1)

was used to calculate the average informational value of the
100 scale alternatives (Attneave, 1959). Eighty-eight of
the 100 possible categories were used by the PIs. The
Shannon-Wiener formula indicated that 5.95 bits (average) of
information were required to arrive at each scaling
decision. Therefore, about 62 (2^5.95) categories would
theoretically suffice in scaling this imagery database. The
100-point scale, although larger than necessary and
different from the standard 10-point scale, appeared to be
easily understood and highly functional in scaling
interpretability.
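
The information calculation above can be reproduced with a few lines of Python. This is a minimal sketch: the function names are hypothetical, and because the study's response frequencies are not reproduced here, the entropy function is demonstrated on a small made-up example while the category estimate uses the reported 5.95 bits.

```python
import math
from collections import Counter

def shannon_information(responses):
    """Average information in bits: H = -sum(p_i * log2(p_i))."""
    counts = Counter(responses)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def equivalent_categories(bits):
    """Number of equally likely categories that would carry `bits` of information."""
    return 2.0 ** bits

# Reported value from the text: 5.95 bits per scaling decision.
print(round(equivalent_categories(5.95)))

# Made-up illustration: eight categories used equally often carry 3 bits.
print(shannon_information(list(range(8))))
```

The conversion from bits to categories assumes equally likely categories, which is why 62 is best read as a theoretical figure rather than a count of the categories actually used (88).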
MDS ANALYSIS


Unfortunately, the Bell Laboratories KYST2A program
purchased for the MDS analysis was not capable of analyzing
all of the data at one time. In fact, a maximum of BO
images had to be selected from the imagery database (250
images) to comply with the limitations of the routine. The
selection was done as follows. Because Blur levels of 40
and 52 μm and S/N levels of 75 and 60 dB did not differ from
one another, these lowest levels of degradation were
discarded. While 10 Scenes in the imagery database were
particularly responsible for its size, three of the more
consistently scaled Scenes, one from each OB, were chosen.
Scale value data for those Scenes (5, 8, and 10) are plotted
in Figures 15, 18, and 20, respectively. Thus, 48 images
were submitted to the MDS analysis, 3 Scenes x 4 Blur levels
x 4 Noise levels. A dissimilarity matrix was calculated as
an input to MDS by taking the absolute difference between
each pair of the 48 images; the actual input to the MDS
analysis was a triangular matrix without diagonal elements,
or 48(47)/2 = 1128 data points.
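
The construction of the MDS input can be sketched as follows. This is an illustration only: it assumes that the "absolute difference between each pair" refers to differences between mean scale values, the function name is hypothetical, and the 48 values are random placeholders rather than the study's data.

```python
import numpy as np

def dissimilarity_pairs(scale_values):
    """Absolute differences for every unordered pair (lower triangle, no diagonal)."""
    values = np.asarray(scale_values, dtype=float)
    full = np.abs(values[:, None] - values[None, :])   # full symmetric matrix
    rows, cols = np.tril_indices(len(values), k=-1)    # strictly below the diagonal
    return full[rows, cols]

rng = np.random.default_rng(0)
mean_scale_values = rng.uniform(0.0, 9.9, size=48)     # placeholder data only
pairs = dissimilarity_pairs(mean_scale_values)
print(pairs.size)   # 48 * 47 / 2 entries
```

Dropping the diagonal and one triangle removes the zero self-dissimilarities and the duplicated symmetric entries, which is how 48 images yield 1128 data points.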

Several runs of the Bell Laboratories routine were made
before the best model was selected. The best model is shown
in Equation 2a. The general form of the MDS model is given
in Equation 2b.


The configuration

relationship between stress (inversely related to the fit)
and the number of dimensions in the model. Because stress
is minimized by a good fit (see Equation 3 for the stress
formula), Figure 21 indicates that the model with five

Figure 21: The relationship between stress and the number
of dimensions (abscissa: number of dimensions)


The MDS projections, in five dimensions, and mean NATO
scale values were used to predict the previously obtained
information extraction performance in a multiple regression
equation. Of the 48 images analyzed by the MDS program,
only 24 were applicable because Blur levels of 52 and 162 μm
were not exploited by the PIs. Of course, all 24 mean scale
values, averaged across judges, were available. Multiple
regression is tantamount to performing a multiple
correlation, except that a slope or weighting factor is
provided for each predictor that allows for an evaluation of

the correlation, r = .70, p = .003. (Statistically, this
correlation was not a significant improvement over .47 or
.52, z = 1.22, p = .110 and z = .90, p = .170,
respectively.)

Finally, overall mean performance scores (averaged over
scenes and PIs) were correlated with corresponding mean
scale scores for each of the 15 Blur and Noise combinations.
The correlation was improved and significant, r = .893, p =
.0001. (This correlation was a statistical improvement
above .47, .52, and .70, z = 3.22, p = .0007; z = 2.93, p =
.0017; and z = 2.11, p = .017, respectively.)


VI.  DISCUSSION


The NATO scale and its usage here presented no problems
for the PIs. In general, both the imagery database and the
scaling task were well received by all of the PIs. While
PIs, by reputation, are very scrupulous of photographic
quality and, for security reasons, sensitive about giving
insight into their work procedure, this study was not
hindered by the use of these highly trained judges, except
for the one case of attrition which was due to workload.


NOISE AND BLUR EFFECTS

The overall statistical analysis confirmed the first
hypothesis. Both noise and blur degradations reduced the
judged interpretability of the images. In general, the
greater the physical image degradation, the greater was the
reduction in interpretability. However, for both noise and
blur, there were no differences in interpretability between
the two lowest levels of degradation. One or both of two
conditions may have been responsible for this result.
First, images assumed to be free of noise had somewhat
higher base levels of noise even before any degradation was
effected. The second possible explanation regards the PIs'
sensitivity to noise and blur. If the true relationship
between noise or blur and interpretability is sigmoid-
shaped, then judges are psychophysically insensitive to
small changes in noise or blur level when the extent of
degradation is either very slight or very great. Stevens
(1975, p. 135) has suggested that when the full range of a
stimulus parameter is used experimentally, the
psychophysical function tends to be sigmoid-shaped.

The combined effect of blur and noise indicated that
these degradations are interactive. The MCT for the Blur x
Noise interaction indicated that noise has little or no
effect on a highly blurred image. But blur did have an
effect on a very "noisy" image. It appears, then, that PIs
react differently to noise degradation than they do to blur.
Subjective comments by PIs substantiate this notion; it is
common for PIs to state that they can "look through" noise
but cannot disregard blur.

NUMBER  OF  RESPONSE  CATEGORIES 

The determination of the optimal number of response
categories took an absolute judgement or information
processing approach. Given that the information processing
analysis is so easy to perform, future research might
attempt to determine the conditions under which the
technique and these results are valid. It should be kept in
mind that the optimal number of response categories is
affected by numerous factors, such as the psychophysical
range and spacing of the stimuli (Alluisi, 1957). This
imagery database was previewed by senior PIs and deemed
representative of the range of quality of imagery actually
interpreted by PIs in practice. This analysis showed that
about 62 categories would be required to allow adequately
for the subjective quality differences in the present
imagery database. Other imagery databases might well
require more or fewer than 62 categories. However,
because this 100-point scale, while larger than needed, was
easy for the PIs to use, it is strongly recommended as a
replacement for the current 10-point NATO scale.


MDS  ANALYSIS 

MDS analysis fits the data to a multidimensional spatial
configuration and calculates a projection on each dimension
for each cell in the experimental design. While MDS
analysis is simply a mathematical curve fitting procedure,
similar to factor analysis, difficulty arises in the
interpretation of the dimensions of the preferred
configuration. It is only through the repeated replication
of an experiment that an investigator can, with some degree
of confidence, begin to understand the meaning of the MDS
dimensions. In light of the numerous significant effects
reported in the overall ANOVA, it is beyond the present data
to interpret the five dimensions of the reported
configuration. It does appear that at least three of the
dimensions map well onto known database variables of noise,
scene (or OB), and blur. Therefore, the MDS analysis, while
not conclusive in its defined dimensions, appears to be
consistent with known facts about the imagery database.

Minimum  stress  was  achieved.  However,  because  the  MDS 
routine  is  limited  to  six  dimensions  in  fitting  the  data, 
there  is  no  assurance  that  the  best  model  did  not  represent 
a  local  stress  minimum  rather  than  a  global  minimum.  While 
the  projections  are  listed  in  Appendix  I,  the  usefulness  of 
this  subjective  response  configuration  remains  uncertain. 

The regression equation utilizing the projections to
predict performance failed to add any meaningfulness to the
MDS analysis. The ultimate goal of either scale values or
physical  metrics  is  the  prediction  of  performance.  Future 
MDS  analyses  in  this  area  of  research  should  concentrate  on 
validating  MDS  projections  in  terms  of  performance.  Perhaps 
scale  scores  that  correlate  better  with  performance  than 
these  did  will  result  in  projections  that  better  predict 
performance.  It  is  possible  that  the  predictive  value  of 
the  MDS  projections  would  have  been  enhanced  by  a  complete 
set  of  performance  data  to  accommodate  the  48  images 
selected  for  MDS. 

In regard to past research and MDS analysis, it is
interesting that this MDS analysis showed that a similar
number of dimensions were required in the spatial
configuration. Also, the interpretation of the dimensions
was quite similar to past research. Blur is doubtlessly
related to sharpness. Considering the data of Marmolin and
Nyberg (1978), these data support the contention that the
Euclidean model, r = 2.0, is more appropriate for MDS
analyses of subjective image quality.

RELATIONSHIP  BETWEEN  INFORMATION  EXTRACTION  AND  SCALING 

In light of the correlations reported by other authors,
the correlation (for individual PIs and Scenes) obtained
here between information extraction performance and NATO
scale values was disappointing though statistically
significant. The most cogent explanation for the low
correlation, and the failure of the regression analysis
discussed above, is the difference between the experimental
designs in the scaling and information extraction studies.
The scaling study was a completely factorial design,
whereas, in the exploitation study, Scenes were confounded
with Noise for each PI, and Blur was treated as a
between-subjects variable. Within-subject factors have been
shown to be more sensitive in demonstrating main effects
(Grice and Hunter, 1964). Blur, the more severe degradation,
was treated as a between-subject variable because of
experimental constraints and because only one of the
degradations could be treated within subjects in that
experiment. Unfortunately, the overall ANOVA for the
information extraction experiment failed to find a
significant main effect of Blur. Noise, the less impactful
parameter in this study as shown by the Blur x Noise
interaction (see Figure 6), was significant in the
information extraction study. This difference in
experimental designs doubtlessly had an effect on the
association between scale values and information extraction
performance.

On the other hand, averaging across Scenes and PIs to
obtain overall sample means for the 5 Noise x 3 Blur
combinations yielded a very satisfactory correlation of r =
.898 between NATO scale value and information extraction
performance. Thus, while individual scene/PI combinations
cannot be predicted very accurately, overall group
performance can. And, after all, the main objective in most
image interpretation research is to predict the performance
of the typical, not the individual, PI.

DIGITAL  IMAGERY  SIMILARITIES 

A final point should be made concerning a comparison
between analog and digital modes of imagery presentation.
From the noise studies reviewed, recall that the middle
subjective scale value corresponded to a signal-to-noise
ratio of about 30 dB (or 32:1). From an inspection of
Figure 3, it can be seen that a scale value of five
(midpoint of NATO scale) corresponds to a signal-to-noise
ratio of 42:1, or about 32 dB.

The average signal-to-echo ratio at the middle scale
value in the studies reviewed was 20 dB (or 10:1). In
Figure 2, the NATO middle scale value of five corresponds to
115 μm of blur or a signal-to-echo ratio of about 17:1
(signal = 2000) or 25 dB. For both noise and blur, the
obtained ratios, "signal-to-degradation", at the middle
scale value were about the same as those reported in the
analog imagery literature.
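
The ratio and decibel pairs quoted above follow the amplitude convention dB = 20 log10(ratio), which is consistent with the stated equivalence of 10:1 and 20 dB. A minimal sketch for checking the figures (the function names are hypothetical):

```python
import math

def ratio_to_db(amplitude_ratio):
    """Convert an amplitude ratio (e.g., 42 for 42:1) to decibels."""
    return 20.0 * math.log10(amplitude_ratio)

def db_to_ratio(db):
    """Convert decibels back to an amplitude ratio."""
    return 10.0 ** (db / 20.0)

cases = [
    ("noise, literature", 32),
    ("noise, this study", 42),
    ("echo, literature", 10),
    ("blur, this study", 17),
]
for label, ratio in cases:
    print(f"{label:20s} {ratio:3d}:1 = {ratio_to_db(ratio):.1f} dB")
```

Rounded to whole decibels, these ratios give 30, 32, 20, and 25 dB, matching the values in the text.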

On the basis of this comparison, digital images would
appear to be very similar to analog images in terms of
subjective quality or interpretability. However, due to
myriad differences in the experimental procedures, past and
present (e.g., electronic versus photographic presentation,
dynamic versus static noise), these conclusions can only be
viewed as cursory pending further research.


VII.  CONCLUSIONS


The main effects of Blur and Noise and the Blur x Noise
interaction showed that digital images, like analog images,
are poorer in subjective quality as the degree of Blur or
Noise increases. The Blur x Noise interaction indicated
that Blur was the more serious degradation at the levels
investigated.

Scene content currently unfamiliar to PIs does not appear
less interpretable than scenes commonly exploited by the Air
Force. In general, PIs seem to evaluate the information
available in an image in an objective manner, regardless of
practice or bias.

The optimal number of response categories for an image
interpretability scale is greater than 10. Because well-
practiced PIs are so attuned to resolution differences in
photographic imagery, as many as 62 categories may be needed
to accurately judge interpretability. The scale of 100
points used in this study provided useful information.

MDS can be used to represent the subjective
dimensionality of a large imagery database. The resulting
spatial configuration can then be used to determine the
physical parameters of the imagery that underlie the
perception of interpretability. However, the attempt to
predict information extraction performance from MDS
projections of subjective data failed. If MDS analysis is
to be used as a predictive response surface, the projections
must be related, in a meaningful way, to performance.
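
To make the idea of a spatial configuration concrete, the following is a minimal sketch of classical (metric) MDS in pure Python. It is not necessarily the MDS procedure used in this study, which is not specified in this section. The dissimilarity matrix and the function names are hypothetical.

```python
import math

def _top_eigenpair(b, iters=200):
    """Largest eigenpair of a symmetric PSD matrix via power iteration."""
    n = len(b)
    v = [1.0 + 0.1 * i for i in range(n)]  # arbitrary non-constant start
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # nothing left to extract
            return 0.0, [0.0] * n
        v = [x / norm for x in w]
    w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * w[i] for i in range(n)), v  # Rayleigh quotient

def classical_mds(dist, dims=2):
    """Embed n items in `dims` dimensions from an n x n distance matrix."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in sq]
    grand = sum(row) / n
    # Double-centre the squared distances: B = -1/2 * J * D^2 * J
    b = [[-0.5 * (sq[i][j] - row[i] - row[j] + grand) for j in range(n)]
         for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        lam, v = _top_eigenpair(b)
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][k] = scale * v[i]
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]  # deflate the extracted axis
    return coords

# Hypothetical dissimilarities among three images (not data from the study).
dissimilarity = [[0, 1, 3],
                 [1, 0, 2],
                 [3, 2, 0]]
for point in classical_mds(dissimilarity):
    print(point)
```

The recovered axes are arbitrary up to rotation and reflection, so relating them to physical image parameters or to performance takes a separate step, such as regressing those variables onto the coordinates.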

Finally, mean scale scores correlated quite well with
information extraction performance in spite of differences
in experimental design. The correlation proved best when
variance due to scene content and PIs was eliminated by
averaging the data.
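
The averaging step described above can be sketched as follows: ratings and performance scores are first averaged over scenes and PIs within each degradation condition, and the condition means are then correlated. All data values, condition labels, and function names here are hypothetical.

```python
import math
from collections import defaultdict

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mean_by_condition(records):
    """Average values over scenes and PIs; records are (condition, scene, pi, value)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for condition, _scene, _pi, value in records:
        sums[condition] += value
        counts[condition] += 1
    return {c: sums[c] / counts[c] for c in sums}

# Hypothetical scale scores and extraction performance (not study data).
ratings = [("low", 1, "A", 80), ("low", 2, "B", 70),
           ("mid", 1, "A", 55), ("mid", 2, "B", 45),
           ("high", 1, "A", 30), ("high", 2, "B", 10)]
performance = [("low", 1, "A", 0.9), ("low", 2, "B", 0.8),
               ("mid", 1, "A", 0.6), ("mid", 2, "B", 0.7),
               ("high", 1, "A", 0.4), ("high", 2, "B", 0.3)]

r_mean = mean_by_condition(ratings)
p_mean = mean_by_condition(performance)
conditions = sorted(r_mean)
r = pearson_r([r_mean[c] for c in conditions], [p_mean[c] for c in conditions])
print(round(r, 3))
```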






Appendix  A 


THE NATO SCALE -- AN IMAGE INTERPRETABILITY SCALE
aircraft, by type, straight-wing and swept-wing).
Recognize ports and harbors (including large ships and
drydocks).




Rating Category 3


Detect communications equipment (radio/radar).

Detect supply dumps (POL/ordnance).

Detect and count accurately all straight-wing aircraft, all
swept-wing aircraft, and all delta-wing aircraft.

Detect command and control headquarters.

Detect surface-to-surface and surface-to-air missile sites
(including vehicles and other pieces of equipment).

Detect land minefields.

Recognize bridges.

Recognize surface ships (distinguish between a cruiser and
a destroyer by relative size and hull shape).

Recognize coast and landing beaches.

Recognize railroad yards and shops.

Recognize surfaced submarines.

Identify airfield facilities.

Identify urban areas.

Identify terrain.

Rating Category 4

Detect rockets and artillery.

Recognize troop units.

Recognize aircraft (such as FAGOT/MIDGET when singly
deployed).

Recognize missile sites (SSM/SAM). Distinguish between
missile types by the presence and relative position
of wings and control fins.

Recognize nuclear weapons components.

Recognize land minefields.

Identify ports and harbors.

Identify railroad yards and shops.

Identify trucks at ground force installations as cargo,
flatbed, or van.

Identify a KRESTA by the helicopter platform flush with the
fantail, a KRESTA II by the raised helicopter platform
(one deck level above fantail and flush with the main
deck).
Rating Category 5

Detect the presence of call letters or numbers and
alphabetical country designator on the wings of large
commercial or cargo aircraft (where alphanumerics are
three feet high or greater).

Recognize command and control headquarters.

Identify a singly deployed tank at a ground forces
installation as light or medium/heavy.

Perform  Technical  Analysis  (PTA)  on  airfield  facilities. 

PTA  on  urban  areas  and  terrain. 


Rating Category 6

Recognize radio/radar equipment.

Recognize supply dumps (POL/ordnance).

Recognize rockets and artillery.

Identify bridges.

Identify troop units.

Identify coast and landing beaches.

Identify a FAGOT or MIDGET by canopy configuration when
singly deployed.

Identify the ground force equipment: T-54/55 tank, BTR-50
armored personnel carrier, 57 mm AA gun.

Identify by type, RBU installations (e.g., 2500 series),
torpedo tubes (e.g., 21 inch/53.34 cm), and surface-
to-air missile launchers on a KANIN DDG,
KRIVAC DDGSP, or KRESTA II.

Identify a ROMEO-class submarine by the presence of the
cowling for the snorkel induction and the snorkel
exhaust.

Identify a WHISKEY-class submarine by the absence of the
cowling and exhaust.

Rating Category 7

Identify radar equipment.

Identify major electronics by type on a KILDEN DDGS or
KASHIN DLG.

Identify command and control headquarters.

Identify nuclear weapons components.

Identify land minefields.

Identify the general configuration of an SSBN/SSGN submarine
sail, to include relative placement of bridge
periscope(s) and main electronics/navigation equipment.

PTA  on  ports,  harbors,  and  roads. 

PTA  on  railroad  yards  and  shops. 
Rating Category 8


Identify supply dumps (POL/ordnance).

Identify rockets and artillery.

Identify aircraft.

Identify missile sites (SSM/SAM).

Identify surface ships.

Identify vehicles.

Identify surfaced submarines (including components such as
ECHO II SSGN sail missile launcher elevator guide and
major electronics/navigation equipment by type).

Identify, on a KRESTA II, the configuration of the major
components of larger electronics equipment and smaller
electronics by type.

Identify  limbs  (arms,  legs)  on  an  individual. 

PTA  on  bridges. 

PTA  on  troop  units. 

PTA  on  coast  and  landing  beaches. 


Rating Category 9


Identify in detail the configuration of a D-30 howitzer
muzzle brake.

Identify in detail on a KILDEN DDGS the configuration of
torpedo tubes and AA gun mountings (including gun
details).

Identify in detail the configuration of an ECHO II SSGN
sail including detailed configuration of electronics
communications equipment and navigation equipment.

PTA on radio/radar equipment.

PTA on supply dumps (POL/ordnance).

PTA  on  missile  sites. 

PTA  on  nuclear  weapons  components. 
Appendix B

SCENE MCT RESULTS


A connecting line indicates no differences among means, as
determined by the Newman-Keuls multiple comparisons test, at
the p < .05 level of confidence.


ORDERED SCENES
8  1  6  9  5  2  3  10  4  7


Appendix  C 

BLUR  X  NOISE  MCT  RESULTS 


A connecting line indicates no difference among means, as
determined by the Newman-Keuls multiple comparisons test, at
the p < .05 level of confidence.


40 μm BLUR

ORDERED SIGNAL-TO-NOISE RATIOS
12  24  42  60  75


52 μm BLUR

ORDERED SIGNAL-TO-NOISE RATIOS
12  24  42  60  75


84 μm BLUR

ORDERED SIGNAL-TO-NOISE RATIOS
12  24  42  60  75


162 μm BLUR

ORDERED SIGNAL-TO-NOISE RATIOS
12  24  42  60  75


322 μm BLUR

ORDERED SIGNAL-TO-NOISE RATIOS
12  24  42  60  75
Appendix D


SCENE MCT RESULTS -- BLUR LEVEL 40


A connecting line indicates no difference among means, as
determined by the Newman-Keuls multiple comparisons test, at
the p < .05 level of confidence.


ORDERED SCENES
6  1  8  5  2  4  9  10  3  7
Appendix E


SCENE MCT RESULTS -- BLUR LEVEL 52

A connecting line indicates no difference among means, as
determined by the Newman-Keuls multiple comparisons test, at
the p < .05 level of confidence.

ORDERED SCENES
8  6  1  5  9  2  3  4  10  7


Appendix  F 


SCENE MCT RESULTS -- BLUR LEVEL 84

A connecting line indicates no difference among means, as
determined by the Newman-Keuls multiple comparisons test, at
the p < .05 level of confidence.

ORDERED SCENES
8  1  5  6  9  10  2  3  4  7
Appendix G

SCENE  MCT  RESULTS  --  BLUR  LEVEL  162 


A  connecting  line  indicates  no  difference  among  means,  as 
determined  by  the  Newman-Keuls  multiple  comparisons  test,  at 
the  p  <  .05  level  of  confidence. 


ORDERED  SCENES 
Appendix H

SCENE MCT RESULTS -- BLUR LEVEL 322


A connecting line indicates no difference among means, as
determined by the Newman-Keuls multiple comparisons test, at
the p < .05 level of confidence.